Stochastic Modified Equations and Adaptive Stochastic Gradient Algorithms
Authors
Abstract
We develop the method of stochastic modified equations (SME), in which stochastic gradient algorithms are approximated in the weak sense by continuous-time stochastic differential equations. We exploit the continuous formulation together with optimal control theory to derive novel adaptive hyper-parameter adjustment policies. Our algorithms have competitive performance with the added benefit of being robust to varying models and datasets. This provides a general methodology for the analysis and design of stochastic gradient algorithms.
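To make the SME idea concrete, here is a minimal sketch (with a hypothetical one-dimensional quadratic objective and Gaussian gradient noise, neither taken from the paper): SGD iterates are compared against an Euler-Maruyama simulation of the corresponding first-order SDE approximation, whose noise scale involves the square root of the learning rate.

```python
import numpy as np

# Minimal sketch (hypothetical example): SGD on f(x) = x^2/2 with additive
# gradient noise, vs. the first-order SME  dX = -X dt + sqrt(eta)*sigma dW,
# simulated by Euler-Maruyama with time step eta (the learning rate).
rng = np.random.default_rng(0)
eta, sigma, steps = 0.01, 0.5, 2000

x_sgd = np.empty(steps); x_sgd[0] = 1.0
x_sme = np.empty(steps); x_sme[0] = 1.0
for k in range(steps - 1):
    # SGD: noisy gradient g = x + sigma * noise
    g = x_sgd[k] + sigma * rng.standard_normal()
    x_sgd[k + 1] = x_sgd[k] - eta * g
    # SME: one Euler-Maruyama step of dX = -X dt + sqrt(eta)*sigma dW
    dW = np.sqrt(eta) * rng.standard_normal()
    x_sme[k + 1] = x_sme[k] - eta * x_sme[k] + np.sqrt(eta) * sigma * dW

# Weak-sense agreement: the stationary second moments should both be close
# to eta * sigma^2 / 2.
print(np.var(x_sgd[-1000:]), np.var(x_sme[-1000:]))
```

The agreement is in the weak sense: individual trajectories differ, but distributional statistics such as the variance match to leading order in the learning rate.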
Similar resources
Multichannel recursive-least-square algorithms and fast-transversal-filter algorithms for active noise control and sound reproduction systems
In the last ten years, there has been much research on active noise control (ANC) systems and transaural sound reproduction (TSR) systems. In these fields, multichannel FIR adaptive filters are used extensively. For training FIR adaptive filters, recursive-least-squares (RLS) algorithms are known to converge faster than stochastic gradient descent techniques, such as t...
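As a point of reference for the RLS family mentioned above, here is a minimal sketch (with hypothetical signals and a hypothetical unknown FIR system, not drawn from the paper) of exponentially-weighted RLS identifying a short FIR filter:

```python
import numpy as np

# Minimal sketch (hypothetical setup): exponentially-weighted RLS
# identifying an unknown FIR system from input/output data, the setting
# in which RLS typically converges faster than LMS-type adaptation.
rng = np.random.default_rng(2)
h = np.array([0.5, -0.3, 0.2, 0.1])      # unknown FIR system (assumed)
n, L, lam = 300, 4, 0.99                 # lam: forgetting factor
x = rng.standard_normal(n)
d = np.convolve(x, h)[:n]                # desired (noise-free) output

w = np.zeros(L)
P = 1e3 * np.eye(L)                      # inverse-correlation estimate
for k in range(L - 1, n):
    u = x[k - L + 1:k + 1][::-1]         # most recent L input samples
    Pu = P @ u
    g = Pu / (lam + u @ Pu)              # gain vector
    e = d[k] - w @ u                     # a priori error
    w = w + g * e                        # coefficient update
    P = (P - np.outer(g, Pu)) / lam      # update inverse correlation

print(np.round(w, 3))
```

In this noise-free setting the weight vector recovers the unknown system essentially exactly; the O(L^2) cost per sample of maintaining `P` is the price paid for the faster convergence, which motivates the fast-transversal-filter variants.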
Novel Stochastic Gradient Adaptive Algorithm with Variable Length
The goal of this paper is to present a novel variable length LMS (Least Mean Square) algorithm, in which the length of the adaptive filter is always a power of two and it is modified using an error estimate. Unlike former variable length stochastic gradient adaptive techniques, the proposed algorithm works in non-stationary situations. The implementation of the adaptive filter is described and ...
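The power-of-two length adaptation described above can be sketched as follows (a minimal illustration with hypothetical signals and a hypothetical growth rule based on a windowed error estimate; the paper's actual update rule may differ):

```python
import numpy as np

# Minimal sketch (hypothetical example): an LMS adaptive filter whose
# length is always a power of two and is doubled when a windowed error
# estimate stays high, in the spirit of the variable-length scheme.
rng = np.random.default_rng(1)
true_h = rng.standard_normal(8)          # unknown system of length 8
n = 4000
x = rng.standard_normal(n)
d = np.convolve(x, true_h)[:n]           # desired signal

L, mu = 2, 0.01                          # start short; length is 2^k
w = np.zeros(L)
err_acc, window = 0.0, 200
for k in range(16, n):
    u = x[k - L + 1:k + 1][::-1]         # most recent L samples
    e = d[k] - w @ u
    w = w + mu * e * u                   # standard LMS update
    err_acc += e * e
    if k % window == 0:                  # periodic length decision
        mse = err_acc / window
        err_acc = 0.0
        if mse > 0.1 and L < 16:         # residual error high: double L
            L *= 2
            w = np.concatenate([w, np.zeros(L // 2)])

print(L)  # filter length after adaptation (a power of two)
```

Restricting the length to powers of two keeps the implementation simple (buffers can be grown by concatenating zeros), while the error-driven rule lets the filter track the unknown system order.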
Stochastic modified equations and the dynamics of stochastic gradient algorithms (Appendix A: Modified equations in the numerical analysis of PDEs)
where u : [0, T ] × [0, L] → R represents a density of some material in [0, L] and c > 0 is the transport velocity. It is well known that simple forward-time central-space differencing leads to instability for all discretization step sizes (LeVeque, 2002). Instead, more sophisticated differencing schemes must be used. We set the time and space discretization steps to ∆t and ∆x and denote u(n∆t, ...
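The instability of forward-time central-space (FTCS) differencing for this transport equation can be observed directly in a minimal sketch (hypothetical grid and parameters, not taken from the text), comparing it against first-order upwind differencing on a periodic grid:

```python
import numpy as np

# Minimal sketch: forward-time central-space (FTCS) vs. first-order upwind
# for u_t + c u_x = 0 on a periodic grid. FTCS is unstable for this
# equation at any step size; upwind is stable when c*dt/dx <= 1.
c, length, nx = 1.0, 1.0, 100
dx = length / nx
dt = 0.5 * dx / c                        # CFL number 0.5
x = np.arange(nx) * dx
u0 = np.sin(2 * np.pi * x)

u_ftcs = u0.copy()
u_up = u0.copy()
for _ in range(600):
    # FTCS: u_j^{n+1} = u_j^n - (c dt / 2 dx)(u_{j+1}^n - u_{j-1}^n)
    u_ftcs = u_ftcs - c * dt / (2 * dx) * (np.roll(u_ftcs, -1) - np.roll(u_ftcs, 1))
    # Upwind (c > 0): u_j^{n+1} = u_j^n - (c dt / dx)(u_j^n - u_{j-1}^n)
    u_up = u_up - c * dt / dx * (u_up - np.roll(u_up, 1))

print(np.max(np.abs(u_ftcs)), np.max(np.abs(u_up)))
```

The FTCS amplification factor exceeds one for every mode, so rounding noise is amplified until the solution blows up, while the upwind solution stays bounded by its initial amplitude. This mirrors the modified-equation viewpoint: upwinding adds a stabilizing diffusion term, whereas FTCS effectively adds anti-diffusion.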
A Unified Approach to Adaptive Regularization in Online and Stochastic Optimization
We describe a framework for deriving and analyzing online optimization algorithms that incorporate adaptive, data-dependent regularization, also termed preconditioning. Such algorithms have been proven useful in stochastic optimization by reshaping the gradients according to the geometry of the data. Our framework captures and unifies much of the existing literature on adaptive online methods, ...
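A representative instance of such data-dependent preconditioning is diagonal AdaGrad-style rescaling, sketched below on a hypothetical badly scaled quadratic (the example objective and parameters are illustrative, not from the paper):

```python
import numpy as np

# Minimal sketch (hypothetical objective): diagonal adaptive regularization
# in the AdaGrad style, where each coordinate's step is rescaled by the
# root of its accumulated squared gradients.
def adagrad(grad, x0, lr=1.0, eps=1e-8, steps=500):
    x = np.asarray(x0, dtype=float)
    acc = np.zeros_like(x)               # running sum of squared gradients
    for _ in range(steps):
        g = grad(x)
        acc += g * g                     # per-coordinate accumulation
        x -= lr * g / (np.sqrt(acc) + eps)  # preconditioned step
    return x

# Badly scaled quadratic: f(x) = 50*x0^2 + 0.25*x1^2
grad = lambda x: np.array([100.0 * x[0], 0.5 * x[1]])
x_star = adagrad(grad, [1.0, 1.0])
print(x_star)
```

The accumulated-gradient denominator acts as a diagonal preconditioner: steep coordinates automatically receive small steps and flat coordinates large ones, which is the geometry-reshaping effect the framework above generalizes.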
ADASECANT: Robust Adaptive Secant Method for Stochastic Gradient
Stochastic gradient algorithms have been the main focus of large-scale learning problems, and they have led to important successes in machine learning. The convergence of SGD depends on the careful choice of learning rate and on the amount of noise in the stochastic gradient estimates. In this paper, we propose a new adaptive learning rate algorithm, which utilizes curvature information for auto...
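To illustrate the general idea of setting a step size from secant curvature information (this is a plainly labeled Barzilai-Borwein-style sketch on a hypothetical quadratic, not the ADASECANT algorithm itself):

```python
import numpy as np

# Minimal sketch (NOT the ADASECANT method): a secant-style step size in
# the Barzilai-Borwein spirit, where curvature along the last step is
# estimated from the change in gradients and used to set the learning rate.
def secant_gradient_descent(grad, x0, lr0=0.1, steps=50):
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    lr = lr0
    for _ in range(steps):
        x_new = x - lr * g
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g      # step and gradient change
        if abs(s @ y) > 1e-12:
            lr = (s @ s) / (s @ y)       # inverse secant curvature estimate
        x, g = x_new, g_new
    return x

# Hypothetical quadratic: f(x) = 2*x0^2 + 0.25*x1^2
grad = lambda x: np.array([4.0 * x[0], 0.5 * x[1]])
x_star = secant_gradient_descent(grad, [1.0, 1.0])
print(x_star)
```

The ratio (s, s)/(s, y) is an inverse Rayleigh-quotient estimate of the curvature along the last step, so the learning rate adapts to the local geometry without any hand tuning; stochastic variants such as ADASECANT additionally have to cope with noise in the gradient differences.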